Second HAREM: Advancing the State of the Art of Named Entity Recognition in Portuguese

Authors

  • Cláudia Freitas
  • Cristina Mota
  • Diana Santos
  • Hugo Gonçalo Oliveira
  • Paula Carvalho
Abstract

In this paper, we present Second HAREM, the second edition of an evaluation campaign for Portuguese, addressing named entity recognition (NER). This second edition also included two new tracks: the recognition and normalization of temporal entities (proposed by a group of participants, and hence not covered in this paper) and ReRelEM, the detection of semantic relations between named entities. We summarize the setup of Second HAREM by showing the preserved distinctive features and discussing the changes compared to the first edition. Furthermore, we present the main results achieved and describe the available resources and tools developed under this evaluation, namely, (i) the golden collections, i.e. a set of documents whose named entities and semantic relations between those entities were manually annotated, (ii) the Second HAREM collection (which contains the unannotated version of the golden collection), as well as the participating systems' results on it, (iii) the scoring tools, and (iv) SAHARA, a Web application that allows interactive evaluation. We end the paper by offering some remarks about what was learned.

Similar resources

HAREM: An Advanced NER Evaluation Contest for Portuguese

In this paper we provide an overview of the first evaluation contest for named entity recognition in Portuguese, HAREM, which features several original traits and provided the first state of the art for the field in Portuguese, as well as a public-domain evaluation architecture.

HAREM and Klue: how to compare two tagsets for named entities annotation

This paper describes an ongoing experiment to compare two tagsets for Named Entities (NE) annotation. We compared the Klue 2 tagset, developed by IBM Research, with the HAREM tagset, developed for tagging the Portuguese corpora used in the Second HAREM competition. From this report, we expected to evaluate our methodology for comparison and to survey the problems that arise from it.

A Golden Resource for Named Entity Recognition in Portuguese

This paper presents a collection of texts manually annotated with named entities in context, which was used for HAREM, the first evaluation contest for named entity recognizers for Portuguese. We discuss the options taken and the originality of our approach compared with previous evaluation initiatives in the area. We document the choice of categories, their quantitative weight in the overall c...

A Complex Evaluation Architecture for HAREM

In this paper we briefly describe the evaluation architecture and the measures employed in HAREM, the first evaluation contest for named entity recognition in Portuguese. All programs are publicly available for experimentation.

Boosting Named Entity Recognition with Neural Character Embeddings

Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level represent...

Publication date: 2010